count processes

MICHAEL H. NEUMANN 

Friedrich-Schiller-Universität Jena, Institut für Stochastik, Ernst-Abbe-Platz 2, 07743 Jena,
Germany. E-mail: michael.neumann@uni-jena.de 

We consider a class of observation-driven Poisson count processes where the current value of the 
accompanying intensity process depends on previous values of both processes. We show under 
a contractive condition that the bivariate process has a unique stationary distribution and that 
a stationary version of the count process is absolutely regular. Moreover, since the intensities 
can be written as measurable functionals of the count variables, we conclude that the bivariate 
process is ergodic. As an important application of these results, we show how a test method 
previously used in the case of independent Poisson data can be used in the case of Poisson count 
processes. 

Keywords: absolute regularity; ergodicity; integer-valued process; mixing; Poisson count 
process; test 

1. Introduction 

The modeling and the analysis of count data has received increasing attention during 
the last decade. There are possible applications in various fields, such as biometrics, 
econometrics and finance; see Davis, Dunsmuir and Wang [9] and Davis and Wu [10] for 
examples. A comprehensive account of models for time series of counts is given in Kedem 
and Fokianos [20], Chapter 4. In the majority of cases the count variables are assumed 
to be Poisson distributed, conditioned on the past and perhaps some additional regressor 
variables. Models for count data consist of at least two processes: an observable process of 
counts and an accompanying intensity process that is usually not observed. Cox [6] and 
later Davis, Dunsmuir and Wang [9] classified these models into parameter-driven and 
observation-driven specifications. In the first case, the accompanying intensity process 
evolves independently of the past history of the observation process while, in the latter 
case, the values on the intensity process do depend on past observations. The major 
aim of this paper is to derive important properties such as stationarity, mixing and 
ergodicity for a certain class of observation-driven processes. Davis, Dunsmuir and Wang 
[9] mentioned that, in contrast to parameter-driven models where these properties are 
inherited by the observation process from the accompanying intensity process, there is
much less theory available in the case of observation-driven models. Actually, ergodicity
has been shown so far in a few special cases only; see Grunwald, Hyndman, Tedesco and
Tweedie [19], Streett [27], Davis, Dunsmuir and Streett [8], Zheng and Basawa [28] and
Fokianos, Rahbek and Tjøstheim [17]. In these papers, the authors could use classical
Markov chain theory. 

In the present paper, we study a model where the observations $N_t$ are Poisson
distributed, conditioned on the past, with an intensity $\lambda_t$ depending on one lagged value of
the count process and the intensity process; that is, $\lambda_t = f(\lambda_{t-1}, N_{t-1})$, for some
function $f$. Models of this type have been considered before by Rydberg and Shephard [26],
Streett [27], Davis, Dunsmuir and Streett [8], Fokianos, Rahbek and Tjøstheim [17] and
Fokianos and Tjøstheim [18]. An important aspect is that such models allow for an
autoregressive (AR) feedback mechanism in the intensity process and it can be expected
that this leads to a parsimonious parametrization. For clarity of exposition, we do not
include additional regressor variables that are often also incorporated in specifications of
the intensity. Under a contractive condition on $f$, we state in Section 2 that the bivariate
process $((N_t, \lambda_t))_{t\in\mathbb{N}}$ has a unique stationary distribution. The proof of this result
is based on a simple construction, where independently started versions of the process
are coupled in such a way that they converge to each other. Section 3 contains the main
results. For a stationary version of the process, we prove absolute regularity ($\beta$-mixing)
of the univariate count process. Since the latter process is not Markovian, we cannot rely
on standard arguments from Markov chain theory; rather, we use coupling arguments
to derive this result. We also discuss an example that shows that the bivariate process
$((N_t, \lambda_t))_{t\in\mathbb{Z}}$ and even the intensity process $(\lambda_t)_{t\in\mathbb{Z}}$ are not absolutely regular in
general. However, since the intensities can be written as measurable functionals of the count
variables, we conclude from the mixing property of the count process that the bivariate
process is ergodic. In Section 4, we propose a test for a particular specification of the
intensity process. We use a test statistic that has been applied before by several authors
in connection with independent Poisson random variables. Using the ergodicity result
from Section 3, we can show that the test statistic is asymptotically normal. All proofs
are deferred to a final Section 5.

2. Stationarity of the bivariate process 

We assume that $(N_t)_{t\in\mathbb{N}}$ is a time series of counts, accompanied by an intensity process
$(\lambda_t)_{t\in\mathbb{N}}$. Denote by $\mathcal{B}_t^{N,\lambda} = \sigma(\lambda_1,\ldots,\lambda_t,N_1,\ldots,N_t)$ the $\sigma$-field generated by the past and
present values of the two processes at time $t$. We assume throughout that

$$N_t \mid \mathcal{B}_{t-1}^{N,\lambda} \sim \mathrm{Poisson}(\lambda_t) \tag{2.1}$$

and

$$\lambda_t = f(\lambda_{t-1}, N_{t-1}) \tag{2.2}$$

for some function $f\colon [0,\infty)\times\mathbb{N}_0 \to [0,\infty)$ ($\mathbb{N}_0 = \mathbb{N}\cup\{0\}$). For the time being, the starting
value $\lambda_1$ may be random or non-random. It follows from the structure of the model that
$\mathcal{B}_{t-1}^{N,\lambda} = \sigma(\lambda_1, N_1, \ldots, N_{t-1})$ and that the bivariate process $((N_t,\lambda_t))_{t\in\mathbb{N}}$ forms a
homogeneous Markov chain. Throughout this paper we will assume that the function $f$ satisfies
the following contractive condition:

$$|f(\lambda,y) - f(\lambda',y')| \le \kappa_1|\lambda-\lambda'| + \kappa_2|y-y'| \qquad \forall \lambda,\lambda' \ge 0,\ \forall y,y' \in \mathbb{N}_0, \tag{2.3}$$

where $\kappa_1$ and $\kappa_2$ are non-negative constants with $\kappa := \kappa_1 + \kappa_2 < 1$. This includes as
a special case a linear specification where $\lambda_t = \theta_0 + \theta_1\lambda_{t-1} + \theta_2 N_{t-1}$ and $\theta_0, \theta_1, \theta_2$ are
non-negative constants with $\theta_1 + \theta_2 < 1$. Rydberg and Shephard [26] proposed such a
model for describing the number of trades on the New York Stock Exchange in certain
time intervals and called it the BIN(1, 1) model. Stationarity and other properties for this
model were derived by Streett [27] and Ferland, Latour and Oraichi [16], who referred to it
as the INGARCH(1, 1) model, and Fokianos, Rahbek and Tjøstheim [17]. The generality
of Condition (2.3) is chosen to include nonlinear specifications such as the exponential
AR model proposed in Fokianos, Rahbek and Tjøstheim [17]. In this case, the intensity
function is specified as $f(\lambda, y) = (a + c\exp(-\gamma\lambda^2))\lambda + by$, where $a, b, c, \gamma > 0$. It follows
from $\frac{\partial}{\partial y} f(\lambda, y) = b$ and $|\frac{\partial}{\partial\lambda} f(\lambda, y)| \le a + c$ that (2.3) is fulfilled if $a + b + c < 1$.
Note that (2.3) implies that

$$f(\lambda,y) \le f(0,0) + \kappa_1\lambda + \kappa_2 y. \tag{2.4}$$

It follows from (2.4) that $E(\lambda_t \mid \lambda_{t-1}) \le f(0,0) + \kappa\lambda_{t-1}$, which leads to

$$E(N_t \mid \lambda_1) = E(\lambda_t \mid \lambda_1) \le f(0,0)\,\frac{1-\kappa^{t-1}}{1-\kappa} + \kappa^{t-1}\lambda_1. \tag{2.5}$$

Hence, the bivariate chain $((N_t, \lambda_t))_{t\in\mathbb{N}}$ is bounded in probability on average. Moreover,
it follows from (2.3) that, for any open set $O \in 2^{\mathbb{N}_0} \otimes \mathcal{B}$, the transition probabilities
$P((N_t,\lambda_t) \in O \mid (N_{t-1}, \lambda_{t-1}) = \cdot)$ are a continuous, and therefore also a lower
semicontinuous, function. Hence, the Markov chain is a weak Feller chain and it follows from
Theorem 12.1.2(ii) in Meyn and Tweedie [23] that there exists at least one stationary
distribution. Uniqueness of this stationary distribution, however, requires more than (2.4)
and will follow from the contractive condition (2.3). The following theorem summarizes
this and a few other useful facts.

Theorem 2.1. Suppose that the bivariate chain $((N_t,\lambda_t))_{t\in\mathbb{N}}$ obeys (2.1)-(2.3). Then

(i) There exists a unique stationary distribution $\pi$.

(ii) If $(N_1, \lambda_1) \sim \pi$, then $E\lambda_1 < \infty$.

(iii) If $f(0,0) = 0$, then $\pi(\{(0,0)\}) = 1$. If $f(0,0) > 0$, then $\pi(\{(y,\lambda)\}) < 1$ for all
$y \in \mathbb{N}_0$, $\lambda \in [0,\infty)$.

Remark 1. Using the contractive property (2.3), we will show in the proof of
Theorem 2.1 that the $n$-step transition laws $P((N_{n+1}, \lambda_{n+1}) \in \cdot \mid (N_1, \lambda_1) = x)$ converge to a
common limit $\pi$ not depending on the starting value $x$, where $\pi$ is a probability measure.
This will imply that $\pi$ is the unique stationary distribution.

There are alternative ways to prove Theorem 2.1. Introducing a sequence of
independent "innovations" $(U_t)_{t\in\mathbb{N}}$ with $U_t \sim \mathrm{Uniform}[0, 1]$, we could re-express the process
values as

$$(N_{t+1},\lambda_{t+1}) = G((N_t,\lambda_t),U_{t+1}) := \big(F^{-1}_{f(\lambda_t,N_t)}(U_{t+1}),\, f(\lambda_t,N_t)\big),$$

where $F_\lambda$ denotes the cumulative distribution function of a Poisson($\lambda$) distribution and
$F_\lambda^{-1}$ its generalized inverse. This
gives us a representation of $((N_t, \lambda_t))_{t\in\mathbb{N}}$ as a randomly perturbed dynamical system with
independent and identically distributed innovations. In such a context and under a
contractive condition similar to our (2.3), Diaconis and Freedman [13] also proved existence
and uniqueness of a stationary distribution. To this end, these authors used backward
iterations to identify a random variable which has the desired stationary distribution.
The approach used here is more direct and also uses elements of standard Markov chain
theory as described in Meyn and Tweedie [23]. Finally, we would like to mention that
Lasota and Mackey [21] also proved the existence of a unique stationary distribution
under conditions similar to our (2.3) and (2.4); see, in particular, equations (2) and (3)
in their paper. Their proof contains similar ingredients to our proof; however, it is more
analytic in nature while we establish a coupling to represent the convergence facts in a
simple stochastic language.
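
The representation as a randomly perturbed dynamical system can be turned into a small simulation. The following Python sketch is only an illustration under stated assumptions: the function names, the linear intensity with parameters 1.0, 0.3 and 0.4, and the simple inversion routine for the Poisson quantile function are all hypothetical choices, not taken from the paper.

```python
import math
import random


def poisson_quantile(u, lam):
    """Smallest k with F_lam(k) >= u, where F_lam is the Poisson(lam) CDF."""
    if lam <= 0.0:
        return 0
    k = 0
    p = math.exp(-lam)  # P(X = 0); adequate for moderate lam only
    cdf = p
    # Stop once the CDF reaches u; the p > 0 guard avoids an endless loop
    # if rounding keeps cdf marginally below u.
    while cdf < u and p > 0.0:
        k += 1
        p *= lam / k
        cdf += p
    return k


def simulate(f, lam1, n, seed=0):
    """Iterate (N_{t+1}, lam_{t+1}) = G((N_t, lam_t), U_{t+1}) for t = 1..n-1."""
    rng = random.Random(seed)
    lams = [lam1]
    counts = [poisson_quantile(rng.random(), lam1)]  # N_1 ~ Poisson(lam_1)
    for _ in range(n - 1):
        lam_next = f(lams[-1], counts[-1])  # equation (2.2)
        counts.append(poisson_quantile(rng.random(), lam_next))  # equation (2.1)
        lams.append(lam_next)
    return counts, lams


def linear_f(lam, y, theta0=1.0, theta1=0.3, theta2=0.4):
    """Hypothetical linear specification; contractive since theta1 + theta2 < 1."""
    return theta0 + theta1 * lam + theta2 * y


if __name__ == "__main__":
    counts, lams = simulate(linear_f, lam1=0.0, n=20000)
    print(sum(counts) / len(counts), sum(lams) / len(lams))
```

For this linear choice, taking expectations in (2.2) under stationarity gives $E\lambda_1 = \theta_0/(1-\theta_1-\theta_2)$, so both printed averages should be close to $1/0.3$ for a long path.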

3. Absolute regularity of the count process and 
ergodicity 

In this section, we state the main results of our paper, absolute regularity of the count
process and, as a consequence, ergodicity of the bivariate process $((N_t, \lambda_t))_t$. Actually,
Grunwald, Hyndman, Tedesco and Tweedie [19], Case II of Proposition 3, Streett [27]
and Davis, Dunsmuir and Streett [8] proved ergodicity in special cases. However, they
made heavy use of the particular form of their link function $f$ and could show that
Doeblin's condition is fulfilled. Hence, they could employ Markov chain technology to
prove ergodicity. We cannot use this approach in the case considered here since Doeblin's
condition will not be satisfied in general. Another commonly used approach to proving
ergodicity, which is not restricted to the case of Markov chains, consists in proving first
strong mixing as a sufficient condition for ergodicity. It turns out, however, that the
bivariate process $((N_t,\lambda_t))_t$ is not strongly mixing in general; a counterexample is given
in Remark 3 below. The problem lies in the discreteness of the distribution of the
"innovations" $N_t$ while the $\lambda_t$ take values on a continuous scale. This makes the commonly
used coupling approach to proving mixing properties of Markov chains impossible. To
give some idea why a discrete distribution of the innovations may cause problems, we
recall the well-known example of a stationary AR(1) process, $X_t = \theta X_{t-1} + \varepsilon_t$, where
the innovations are independent with $P(\varepsilon_t = 1) = P(\varepsilon_t = -1) = 1/2$ and $0 < |\theta| \le 1/2$.
This process has a stationary distribution supported on $[-2, 2]$. It follows from the above
model equation that $X_t$ has, with probability 1, the same sign as $\varepsilon_t$. Hence, we could
perfectly recover $X_{t-1}, X_{t-2}, \ldots$ from $X_t$, which clearly excludes any of the common
mixing properties. (Rosenblatt [25] mentioned the fact that a process similar to $(X_t)_{t\in\mathbb{Z}}$
is purely deterministic going backwards in time. A rigorous proof that it is not strong
mixing was given by Andrews [1].) On the other hand, we can prove absolute regularity
for the (univariate) count process $(N_t)_t$. For this purpose, the discrete nature of the
distribution of the $N_t$ does no harm. To see why, note that we have either $\pi(\{(0,0)\}) = 1$
or $P(\lambda_{t-1} > 0 \text{ or } \lambda_t > 0) = 1$; see the proof of part (iii) of Theorem 2.1. Therefore, the
support of the conditional distribution of $N_{t+2}$ given $N_t, N_{t-1}, \ldots$ is equal to the support
of the stationary distribution of the $N_t$ and we can actually construct a successful
coupling. Since absolute regularity implies strong mixing, we immediately obtain ergodicity
of the count process $(N_t)_t$. Moreover, as a by-product of our coupling, we see that the
random intensities $\lambda_t$ can be expressed as measurable functionals of past variables of the
count process. Hence, we finally obtain the desired ergodicity of the bivariate process
$((N_t,\lambda_t))_t$.

It was stated in Section 2 that the bivariate process $((N_t,\lambda_t))_t$ has a unique
stationary distribution under the contractive condition (2.3). In this section, we will assume
throughout that this process is in its stationary regime. Moreover, it proves to be quite 
convenient to have a two-sided stationary version, with time domain Z rather than N, 
which exists by Kolmogorov's extension theorem; see Durrett [15], page 293. Here is the 
main result of the paper. 

Theorem 3.1. Suppose that the bivariate chain $((N_t,\lambda_t))_{t\in\mathbb{Z}}$ is in its stationary regime
and obeys (2.1)-(2.3). Then

(i) The count process $(N_t)_{t\in\mathbb{Z}}$ is absolutely regular with coefficients satisfying

$$\beta(n) \le 2E\lambda_1\,\kappa^{n-1}/(1-\kappa_1).$$

(ii) There exists a measurable function $g\colon \mathbb{N}_0^\infty := \{(n_1, n_2, \ldots)\colon n_i \in \mathbb{N}_0\} \to [0,\infty)$
such that $\lambda_t = g(N_{t-1}, N_{t-2}, \ldots)$ holds almost surely.

(iii) The process $((N_t, \lambda_t))_{t\in\mathbb{Z}}$ is ergodic.

(iv) $E\lambda_1^2 < \infty$.

Remark 2. In the case of a so-called INGARCH(1,1) process where $\lambda_t$ is specified
as $\lambda_t = \theta_0 + \theta_1\lambda_{t-1} + \theta_2 N_{t-1}$, Ferland, Latour and Oraichi [16] proved the stronger
result that all moments of $\lambda_t$ and $N_t$ are finite. Since it follows from (2.4) that
$\lambda_t \le f(0, 0) + \kappa_1\lambda_{t-1} + \kappa_2 N_{t-1}$, we conjecture that their result can be generalized by
simple majorization arguments to our more general framework. However, since
higher-than-second moments are not needed for the purposes of this paper, we do not make the
attempt to adapt their proof, which was already quite involved in the special case of a
linear specification of $\lambda_t$.

Remark 3. Theorem 3.1 states that the count process $(N_t)_{t\in\mathbb{Z}}$ is absolutely regular
and, therefore, also strongly mixing under condition (2.3). This allows us to conclude
that the bivariate process $((N_t, \lambda_t))_{t\in\mathbb{Z}}$ is ergodic. However, the process $((N_t,\lambda_t))_{t\in\mathbb{Z}}$ and
even the intensity process $(\lambda_t)_{t\in\mathbb{Z}}$ alone are not strongly mixing in general. To see this,
consider the specification $f(\lambda, y) = g(\lambda) + y/2$, where $g$ is strictly monotone and satisfies
$0 < c_1 \le g(\lambda) < 0.5$ and $|g(\lambda) - g(\lambda')| \le c_2|\lambda - \lambda'|$ for some $c_2 < 0.5$ and for all $\lambda, \lambda'$.
Then $f$ satisfies our contractive condition (2.3). Using the fact that $g(\lambda) \in [c_1, 0.5)$, we
obtain that $2g(\lambda_{t-1}) = 2\lambda_t - [2\lambda_t]$, where $[x]$ denotes the integer part of $x$, which implies
that we can perfectly recover $\lambda_{t-1}$ once
we know the value of $\lambda_t$. Iterating this argument, we see that we can recover from $\lambda_t$
the entire past of the intensity process. Taking into account that the above choice of $f$
excludes the case that the intensity process is purely non-random, we conclude that a
stationary version of $(\lambda_t)_{t\in\mathbb{Z}}$ cannot be strongly mixing.
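
The recovery argument of Remark 3 can be checked numerically. The sketch below is a hedged illustration: the concrete function `g` (with values in $[0.25, 0.45)$ and Lipschitz constant 0.2), its inverse, and all helper names are hypothetical choices that merely satisfy the assumptions of the remark.

```python
import math
import random


def g(lam):
    """Hypothetical g: strictly increasing, values in [0.25, 0.45), Lipschitz 0.2."""
    return 0.25 + 0.2 * lam / (1.0 + lam)


def g_inverse(v):
    """Inverse of g on its range [0.25, 0.45)."""
    w = (v - 0.25) / 0.2
    return w / (1.0 - w)


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for small intensities."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def recover_previous(lam_t):
    """Recover (N_{t-1}, lam_{t-1}) from lam_t = g(lam_{t-1}) + N_{t-1} / 2."""
    n_prev = math.floor(2.0 * lam_t)  # [2 lam_t] = N_{t-1} because 2 g(.) < 1
    lam_prev = g_inverse((2.0 * lam_t - n_prev) / 2.0)
    return n_prev, lam_prev


def check_recovery(steps=1000, seed=0):
    """Run the chain and report whether every step could be inverted."""
    rng = random.Random(seed)
    lam, max_err, counts_ok = 0.5, 0.0, True
    for _ in range(steps):
        n = sample_poisson(rng, lam)
        lam_next = g(lam) + n / 2.0
        n_rec, lam_rec = recover_previous(lam_next)
        counts_ok = counts_ok and (n_rec == n)
        max_err = max(max_err, abs(lam_rec - lam))
        lam = lam_next
    return counts_ok, max_err
```

Calling `check_recovery()` returns whether all counts were recovered exactly and the largest reconstruction error of the previous intensity, which should be at the level of floating-point rounding.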

Remark 4. The primary intention of the author was to devise a method of proving
ergodicity of certain count processes. This is done, mainly for clarity of presentation, for 
the simple case where the intensity depends only on one lagged value of the count process 
and the intensity process. In contrast to previous work in this area, the coupling approach 
used here does not require Markovianity of the process. The results of this paper, and 
in particular the ergodicity stated in Theorem 3.1, can be generalized to more complex 
models with more than one or even infinitely many lagged variables. Moreover, it seems 
to be possible to include covariates, at least if they are exogenous. These generalizations
are well beyond the scope of this paper and should be the subject of future research. 

4. A specification test for the intensity function 

There might be good reasons for assuming that the count variables are Poisson
distributed, conditioned on the past. However, a particular specification for the intensity
function seems to be more questionable and such a choice should be supported by a
statistical test. Here we propose a test statistic that was originally designed for testing
overdispersion in the context of i.i.d. observations; see Lee [22] and Cameron and Trivedi
[5].

Assume that we have observations $N_1, \ldots, N_n$ from a stationary process $((N_t, \lambda_t))_{t\in\mathbb{Z}}$
obeying (2.1) and (2.2) and that we want to test the simple hypothesis

$$H_0\colon f = f_0 \quad\text{against}\quad H_1\colon f \ne f_0$$

for some $f_0$ satisfying (2.3), or the composite hypothesis

$$H_0'\colon f \in \{f_\theta\colon \theta \in \Theta\} \quad\text{against}\quad H_1'\colon f \notin \{f_\theta\colon \theta \in \Theta\},$$

where $\Theta \subseteq \mathbb{R}^d$ and the $f_\theta$ satisfy (2.3).

To motivate a particular test statistic, pretend that we additionally observe the starting
value $\lambda_1$ of the intensity process. Then we could take, for testing $H_0$ against $H_1$, the
statistic

$$T_{n,0} = \frac{1}{\sqrt{n}} \sum_{t=1}^n \{(N_t - \lambda_t^0)^2 - N_t\},$$

where $\lambda_1^0 = \lambda_1$ and, for $t = 2, \ldots, n$, the $\lambda_t^0$ are recursively defined as $\lambda_t^0 = f_0(\lambda_{t-1}^0, N_{t-1})$.
The idea behind this statistic is very simple. If the intensity function $f$ is correctly
specified, then $\lambda_t^0 = \lambda_t$, which implies $E[(N_t - \lambda_t^0)^2 - N_t] = 0$ and, as stated in
Proposition 4.1 below, $T_{n,0} \stackrel{d}{\longrightarrow} N(0, 2E\lambda_1^2)$. On the other hand, if $f$ is not correctly specified
by $f_0$, then the random variables $(N_t - \lambda_t^0)^2 - N_t$ are not centered and we can expect
consistency of the test.

In the more relevant case of unknown $\lambda_1$, we replace this by any arbitrarily chosen,
random or non-random, starting value $\tilde\lambda_1$, then define recursively $\tilde\lambda_t = f_0(\tilde\lambda_{t-1}, N_{t-1})$,
for $t = 2, \ldots, n$, and take the test statistic

$$\tilde T_n = \frac{1}{\sqrt{n}} \sum_{t=1}^n \{(N_t - \tilde\lambda_t)^2 - N_t\}.$$

In the case of testing $H_0'$ against $H_1'$, we estimate the parameter $\theta$ by some estimator $\hat\theta_n$
first and then take the test statistic

$$\hat T_n = \frac{1}{\sqrt{n}} \sum_{t=1}^n \{(N_t - \hat\lambda_t)^2 - N_t\}.$$

Here $\hat\lambda_1$ is again any starting value and $\hat\lambda_t = f_{\hat\theta_n}(\hat\lambda_{t-1}, N_{t-1})$, for $t = 2, \ldots, n$.

Remark 5. In the context of independent observations, Lee [22] and Cameron and
Trivedi [5] considered a test statistic similar to ours for testing the Poisson hypothesis
against the alternative that the distribution belongs to the so-called Katz family of
distributions. This family contains as special cases the Poisson, negative binomial and
binomial distributions. While the variance equals the mean in the Poisson case, the latter
two classes contain distributions for which the variance-mean ratio is strictly greater or
less than 1, respectively. Therefore, Lee [22] and Cameron and Trivedi [5] interpreted their
tests as tests for over- or underdispersion. The same test statistic was also suggested in
Cox [7]. It was also used by Brännäs and Johansson [4] for testing for the existence of a
latent process in the context of Poisson count models. Again, in the case of independent
data, Dean and Lawless [11] and Dean [12] came up with adjusted versions of Lee's
and Cameron and Trivedi's test statistic that have the same limit distribution as the
unadjusted statistic but are closer to this limit in small samples.

We will prove that the above statistics, $T_{n,0}$, $\tilde T_n$ and $\hat T_n$, are asymptotically normal
with the same limit. This can be most easily done for $T_{n,0}$ since this statistic is a sum of
martingale differences that allows us to apply an appropriate central limit theorem.

Proposition 4.1. Suppose that the bivariate process is stationary and obeys (2.1) and
(2.2). If $H_0$ is true and $f_0$ satisfies the contractive condition (2.3), then

$$T_{n,0} \stackrel{d}{\longrightarrow} N(0, 2E\lambda_1^2).$$

Next, we will show that $\tilde T_n$ and $\hat T_n$ have the same limit distribution as $T_{n,0}$. To this end,
we will simply show that the difference between each of the former statistics and $T_{n,0}$ converges
to zero in probability. This is not surprising at all for $\tilde T_n$ since it follows from (2.3) that
$|\tilde\lambda_t - \lambda_t| \le \kappa_1^{t-1}|\tilde\lambda_1 - \lambda_1|$. The following lemma shows that $\hat\lambda_t$ will also be close to $\lambda_t$ if
$\hat\theta_n$ is a $\sqrt{n}$-consistent estimator of $\theta_0$ and if $f_\theta(\lambda,y)$ is a smooth function in $\theta$.

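Lemma 4.1. Suppose that the bivariate process is stationary and obeys (2.1) and (2.2)
with $f = f_{\theta_0}$. We assume that $\hat\theta_n - \theta_0 = O_P(n^{-1/2})$. Furthermore, we assume that there
exist $C < \infty$, $\kappa_1, \kappa_2 \ge 0$ with $\kappa := \kappa_1 + \kappa_2 < 1$ such that

(i) $|f_{\theta'}(\lambda,y) - f_{\theta_0}(\lambda,y)| \le C\|\theta' - \theta_0\|(\lambda + y + 1) \quad \forall \lambda, y$,

(ii) $|f_{\theta'}(\lambda, y) - f_{\theta'}(\bar\lambda,\bar y)| \le \kappa_1|\lambda - \bar\lambda| + \kappa_2|y - \bar y|$

hold for all $\theta' \in \Theta$ with $\|\theta' - \theta_0\| \le \delta$, for some $\delta > 0$.
Then

$$\sum_{t=1}^n (\hat\lambda_t - \lambda_t)^2 = O_P(1).$$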

We think that the above assumption on the estimator $\hat\theta_n$ is a realistic one in many
cases. It is fulfilled, for example, by the conditional maximum likelihood estimator studied
in Fokianos, Rahbek and Tjøstheim [17].

Theorem 4.1. Suppose that the assumptions of Lemma 4.1 are fulfilled.
Then

$$\hat T_n \stackrel{d}{\longrightarrow} N(0, 2E\lambda_1^2).$$

Remark 6. The same assertion holds true for $\tilde T_n$ instead of $\hat T_n$ since this is obviously a
special case of that considered in Theorem 4.1.

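Note that the limit distribution of $\hat T_n$ still contains the parameter $E\lambda_1^2$ that is usually
not known in advance and has to be estimated. We obtain from Lemma 4.1 by the
Minkowski inequality that

$$\Bigg|\sqrt{n^{-1}\sum_{t=1}^n \hat\lambda_t^2} - \sqrt{n^{-1}\sum_{t=1}^n \lambda_t^2}\Bigg| \le \sqrt{n^{-1}\sum_{t=1}^n (\hat\lambda_t - \lambda_t)^2} = O_P(n^{-1/2}),$$

which leads in conjunction with ergodicity of $(\lambda_t)_{t\in\mathbb{Z}}$ to

$$\frac{1}{n}\sum_{t=1}^n \hat\lambda_t^2 \stackrel{P}{\longrightarrow} E\lambda_1^2. \tag{4.1}$$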

For a prescribed size $\alpha \in (0, 1)$, we propose a test for $H_0'$ against $H_1'$ as

$$\varphi_n = I\Bigg(\Bigg((2/n)\sum_{t=1}^n \hat\lambda_t^2\Bigg)^{-1/2} \hat T_n > u_\alpha\Bigg),$$

where $u_\alpha = \Phi^{-1}(1 - \alpha)$ denotes the $(1 - \alpha)$-quantile of the standard normal distribution.
From Theorem 4.1 and (4.1) we conclude that this test has asymptotically the correct
size.

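Theorem 4.2. Suppose that the assumptions of Lemma 4.1 are fulfilled and that
$f_{\theta_0}(0,0) > 0$. Then we have, under $H_0'$,

$$\Bigg((2/n)\sum_{t=1}^n \hat\lambda_t^2\Bigg)^{-1/2} \hat T_n \stackrel{d}{\longrightarrow} N(0,1),$$

which implies that

$$P(\varphi_n = 1) \longrightarrow \alpha.$$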
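
To make the test procedure concrete, here is a minimal Python sketch of the standardized statistic and the one-sided decision rule. It assumes that the (possibly estimated) intensity function is passed in as an ordinary callable; the function names, the default starting value and the error handling are hypothetical and not part of the paper.

```python
import math
from statistics import NormalDist


def fitted_intensities(counts, f, lam_start):
    """Recursion lam_1 = lam_start, lam_t = f(lam_{t-1}, N_{t-1}) for t = 2..n."""
    lams = [lam_start]
    for n_prev in counts[:-1]:
        lams.append(f(lams[-1], n_prev))
    return lams


def specification_test(counts, f, lam_start=0.0, alpha=0.05):
    """Return (standardized statistic, reject?) for the one-sided test."""
    n = len(counts)
    lams = fitted_intensities(counts, f, lam_start)
    # n^{-1/2} * sum of (N_t - lam_t)^2 - N_t
    t_n = sum((y - l) ** 2 - y for y, l in zip(counts, lams)) / math.sqrt(n)
    # Plug-in estimate of sqrt(2 E lam_1^2), cf. (4.1)
    scale = math.sqrt(2.0 * sum(l * l for l in lams) / n)
    if scale == 0.0:
        raise ValueError("all fitted intensities are zero; statistic undefined")
    z = t_n / scale
    return z, z > NormalDist().inv_cdf(1.0 - alpha)
```

A call such as `specification_test(counts, lambda lam, y: 1.0 + 0.3 * lam + 0.4 * y)` would test a fully specified linear intensity; for the composite hypothesis one would first plug an estimate of the parameter into the callable.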

5. Proofs 

As already mentioned in the text, the main results of this paper, Theorems 2.1 and 3.1, 
are both proved by coupling arguments. Necessary technical prerequisites are summarized 
in the following lemma. 

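Lemma 5.1. For arbitrary $\lambda_1, \lambda_2 \ge 0$, we can construct on an appropriate probability
space $X_1 \sim \mathrm{Poisson}(\lambda_1)$ and $X_2 \sim \mathrm{Poisson}(\lambda_2)$ such that

(i) $E|X_1 - X_2| = |\lambda_1 - \lambda_2|$,

(ii) $P(X_1 \ne X_2) \le |\lambda_1 - \lambda_2|$.

Proof. Let, without loss of generality, $\lambda_1 \le \lambda_2$. We take independent random variables
$X_1 \sim \mathrm{Poisson}(\lambda_1)$, $Z \sim \mathrm{Poisson}(\lambda_2 - \lambda_1)$ and define $X_2 = X_1 + Z$. Then $X_2 \sim \mathrm{Poisson}(\lambda_2)$,

$$E|X_1 - X_2| = EZ = |\lambda_1 - \lambda_2|$$

and

$$P(X_1 \ne X_2) = P(Z \ne 0) \le EZ = |\lambda_1 - \lambda_2|.$$

□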
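
The construction in the proof of Lemma 5.1 is easy to simulate. The following Python sketch is illustrative only; the sampler (Knuth's multiplication method) and the function names are assumptions made for this example.

```python
import math
import random


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for small intensities."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def coupled_poisson(rng, lam1, lam2):
    """Return (X1, X2), X_i ~ Poisson(lam_i), with |X1 - X2| ~ Poisson(|lam1 - lam2|)."""
    low, high = min(lam1, lam2), max(lam1, lam2)
    x_low = sample_poisson(rng, low)
    z = sample_poisson(rng, high - low)  # independent of x_low
    x_high = x_low + z
    return (x_low, x_high) if lam1 <= lam2 else (x_high, x_low)


def coupling_summary(lam1, lam2, reps=20000, seed=0):
    """Monte Carlo estimates of E|X1 - X2| and P(X1 != X2)."""
    rng = random.Random(seed)
    pairs = [coupled_poisson(rng, lam1, lam2) for _ in range(reps)]
    mean_abs_diff = sum(abs(a - b) for a, b in pairs) / reps
    prob_differ = sum(a != b for a, b in pairs) / reps
    return mean_abs_diff, prob_differ
```

For intensities 1.2 and 1.5, the first estimate should be near $0.3$ and the second near $1 - e^{-0.3}$, which is below the bound $0.3$ from part (ii).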

Proof of Theorem 2.1. As mentioned above, we could use the fact that $((N_t, \lambda_t))_{t\in\mathbb{N}}$
is a weak Feller chain that is bounded in probability on average to conclude from
Theorem 12.1.2(ii) in Meyn and Tweedie [23] that it has at least one stationary distribution.
Uniqueness could then eventually be derived from the contraction property (2.3). We
think, however, that it is more instructive for the reader when a self-contained proof that
uses arguments closely tied to the particular case at hand is presented.

Let $P_\lambda^t$ be the conditional distribution of $(N_t,\lambda_t)$ given $\lambda_1 = \lambda$, where $\lambda \in [0, \infty)$ is
an arbitrarily chosen but fixed starting value. It follows from (2.5) that the sequence of
distributions $(P_\lambda^t)_{t\in\mathbb{N}}$ is tight. Hence, there exists a subsequence $(n_k)_{k\in\mathbb{N}}$ of $\mathbb{N}$ such that
$P_\lambda^{n_k}$ converges weakly to some probability measure $\pi_\lambda$, as $k \to \infty$. We will show that
this limit does not depend on the starting value $\lambda$ and that the full sequence $(P_\lambda^n)_{n\in\mathbb{N}}$
converges. This will immediately imply that $\pi_\lambda$ is a stationary distribution that is unique.

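The latter conclusions will follow after we have derived a few convergence properties of
the process. To this end, we construct, on an appropriate probability space $(\Omega', \mathcal{A}', P')$,
two Markov chains $((N_t', \lambda_t'))_{t\in\mathbb{N}}$ and $((N_t'', \lambda_t''))_{t\in\mathbb{N}}$ with transition laws according to (2.1)
and (2.2) and with starting values $\lambda_1'$ and $\lambda_1''$, respectively. We construct these chains
iteratively. Given $\lambda_1'$ and $\lambda_1''$, (i) of Lemma 5.1 allows us to construct $N_1'$ and $N_1''$ in such
a way that

$$E(|N_1' - N_1''| \mid \lambda_1', \lambda_1'') = |\lambda_1' - \lambda_1''|.$$

The values of $\lambda_2'$ and $\lambda_2''$ are then given by equation (2.2) and it follows from (2.3) that

$$E(|\lambda_2' - \lambda_2''| \mid \lambda_1', \lambda_1'') \le \kappa_1|\lambda_1' - \lambda_1''| + \kappa_2 E(|N_1' - N_1''| \mid \lambda_1', \lambda_1'') = \kappa|\lambda_1' - \lambda_1''|.$$

In the next step we can construct $N_2'$ and $N_2''$ such that $E(|N_2' - N_2''| \mid \lambda_1', \lambda_1'', \lambda_2', \lambda_2'') =
|\lambda_2' - \lambda_2''|$, which also implies that

$$E(|N_2' - N_2''| \mid \lambda_1', \lambda_1'') \le \kappa|\lambda_1' - \lambda_1''|.$$

Now we can proceed in the same way and construct the pairs $(N_3', N_3''), (N_4', N_4''), \ldots$.

With the above construction, we obtain that

$$E(|\lambda_t' - \lambda_t''| \mid \lambda_1', \lambda_1'') \le \kappa^{t-1}|\lambda_1' - \lambda_1''| \tag{5.1}$$

and

$$E(|N_t' - N_t''| \mid \lambda_1', \lambda_1'') \le \kappa^{t-1}|\lambda_1' - \lambda_1''|. \tag{5.2}$$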

Hence, it follows that $(P_{\lambda_1'}^{n_k})_{k\in\mathbb{N}}$ and $(P_{\lambda_1''}^{n_k})_{k\in\mathbb{N}}$ converge for any choice of $\lambda_1'$ and $\lambda_1''$ to
the same limit, which we denote by $\pi$ in the following. Now we can translate this result to
a convergence result for the conditional distributions of the Markov chain $((N_t, \lambda_t))_{t\in\mathbb{N}}$.
Since the above convergence is uniform in $\lambda_1'$ over compact sets and since $f$ as a
continuous function maps compact subsets of $[0,\infty) \times \mathbb{N}_0$ to compact subsets of $[0,\infty)$, we
obtain that

$$\sup_{x\in K} |P^{n_k}(x,A) - \pi(A)| \longrightarrow 0 \quad (k \to \infty) \tag{5.3}$$

holds for every compact subset $K$ of $\mathbb{N}_0 \times [0, \infty)$ and every $\pi$-continuity set $A$. Here
$P^n(x, A) = P((N_{t+n},\lambda_{t+n}) \in A \mid (N_t,\lambda_t) = x)$ denotes the $n$-step transition probability
of the bivariate process. Equation (5.3) will allow us to show convergence of the full
sequence. For any $n \in \mathbb{N}$, let $k(n)$ be the largest integer such that $n_{k(n)} < n$. From tightness
of $(P_\lambda^n)_{n\in\mathbb{N}}$ and (5.3), we conclude that

$$P_\lambda^n = \int P^{n_{k(n)}}(x,\cdot)\,P_\lambda^{n-n_{k(n)}}(dx) \Longrightarrow \pi \quad \text{for all } \lambda \in [0,\infty). \tag{5.4}$$
It follows directly from this equation that $\pi$ is a stationary distribution. To see this,
observe that it follows from (5.4) that $Q_\lambda^n := n^{-1}\sum_{t=1}^n P_\lambda^t$ converges weakly to $\pi$.
Furthermore, it also follows that $\bar Q_\lambda^n := n^{-1}\sum_{t=1}^n P_\lambda^{t+1} \Longrightarrow \pi$, that is,

$$\bar Q_\lambda^n(A) \longrightarrow \pi(A) \quad (n \to \infty) \tag{5.5}$$

if $A$ is a $\pi$-continuity set, that is, $\pi(\partial A) = 0$. If $A$ is an open set, then $x \mapsto P^1(x, A)$ is a
continuous and bounded function. Therefore,

$$\bar Q_\lambda^n(A) = \int P^1(x,A)\,Q_\lambda^n(dx) \longrightarrow \int P^1(x,A)\,\pi(dx). \tag{5.6}$$

From (5.5) and (5.6) we obtain that the probability measures $\pi$ and $\int P^1(x, \cdot)\,\pi(dx)$
coincide for all open $\pi$-continuity sets $A$. Since these sets are stable under finite intersections
and generate $2^{\mathbb{N}_0} \otimes \mathcal{B}$, we conclude that

$$\pi(A) = \int P^1(x, A)\,\pi(dx) \qquad \forall A \in 2^{\mathbb{N}_0} \otimes \mathcal{B},$$

that is, $\pi$ is actually a stationary distribution. Let $\pi'$ be an arbitrary distribution. Then
we obtain by majorized convergence, for any $\pi$-continuity set $A$,

$$\int P^n(x,A)\,\pi'(dx) \longrightarrow \int \pi(A)\,\pi'(dx) = \pi(A).$$

If $\pi'$ is a stationary distribution, then we also have that $\int P^n(x, A)\,\pi'(dx) = \pi'(A)$, which
implies that $\pi = \pi'$. Hence, (i) is proved.

We obtain from (2.5) and by Theorem 5.3 in Billingsley [2] that

$$E_\pi\lambda_1 \le \liminf_{t\to\infty} E(\lambda_t \mid \lambda_1 = 0) \le f(0,0)/(1 - \kappa),$$

which proves (ii).

To see (iii), note that $f(0,0) = 0$ implies by (2.5) that $E(\lambda_t \mid \lambda_1 = 0) = 0$ holds for all $t$
which in turn implies that $\pi(\{(0,0)\}) = 1$. On the other hand, if $f(0,0) > 0$, then we can
conclude that $P(\lambda_{t-1} > 0 \text{ or } \lambda_t > 0) = P(\lambda_{t-1} > 0) + P(\lambda_{t-1} = 0, \lambda_t > 0) = P(\lambda_{t-1} > 0) +
P(\lambda_{t-1} = 0) = 1$. This implies that $((N_t,\lambda_t))_{t\in\mathbb{Z}}$ cannot be non-random, as required. □
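
The contraction (5.1) used in the proof above can be observed in a simulation of two coupled chains. The sketch below is a hedged illustration: the linear intensity function with $\kappa_1 = 0.3$ and $\kappa_2 = 0.4$, the starting values and all function names are hypothetical choices.

```python
import math
import random


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for small intensities."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def coupled_step(rng, f, lam_a, lam_b):
    """One transition of both chains, counts coupled as in Lemma 5.1(i)."""
    low, high = min(lam_a, lam_b), max(lam_a, lam_b)
    n_low = sample_poisson(rng, low)
    n_high = n_low + sample_poisson(rng, high - low)
    n_a, n_b = (n_low, n_high) if lam_a <= lam_b else (n_high, n_low)
    return f(lam_a, n_a), f(lam_b, n_b)


def mean_gaps(f, lam_a, lam_b, steps, reps=5000, seed=0):
    """Monte Carlo estimate of E|lam'_t - lam''_t| for t = 1..steps."""
    rng = random.Random(seed)
    totals = [0.0] * steps
    for _ in range(reps):
        a, b = lam_a, lam_b
        for t in range(steps):
            totals[t] += abs(a - b)
            a, b = coupled_step(rng, f, a, b)
    return [total / reps for total in totals]


def linear_f(lam, y):
    """Hypothetical linear specification with kappa_1 = 0.3, kappa_2 = 0.4."""
    return 1.0 + 0.3 * lam + 0.4 * y
```

With starting values 1.0 and 3.0, the estimated mean gaps returned by `mean_gaps(linear_f, 1.0, 3.0, steps=6)` should stay close to $2 \cdot 0.7^{t-1}$; for a linear $f$ the bound (5.1) holds with equality in expectation.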

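Proof of Theorem 3.1. Let, for $-\infty \le k \le l \le \infty$, $\mathcal{B}^N_{k,l} = \sigma(N_k, \ldots, N_l)$. Recall that
the coefficients of absolute regularity of the count process $(N_t)_{t\in\mathbb{Z}}$ are defined as

$$\beta(n) = E\Big[\sup_{A\in\mathcal{B}^N_{n,\infty}} |P(A \mid \mathcal{B}^N_{-\infty,0}) - P(A)|\Big].$$

Hence,

$$\beta(n) \le E\Big[\sup_{A\in\mathcal{B}^N_{n,\infty}} |P(A \mid \sigma(\lambda_1, N_0, N_{-1}, \ldots)) - P(A)|\Big].$$

Furthermore, it follows from $(N_n, N_{n+1}, \ldots) \mid \sigma(\lambda_1, N_0, N_{-1}, \ldots) \stackrel{d}{=} (N_n, N_{n+1}, \ldots) \mid \sigma(\lambda_1)$
that

$$\beta(n) \le E\Big[\sup_{A\in\mathcal{B}^N_{n,\infty}} |P(A \mid \sigma(\lambda_1)) - P(A)|\Big]. \tag{5.7}$$

Let $\mathcal{B}^\infty$ be the $\sigma$-field in $\mathbb{R}^\infty = \{(x_1, x_2, \ldots)\colon x_i \in \mathbb{R}\}$ generated by the cylinder sets, that
is,

$$\mathcal{B}^\infty = \sigma(\{B \times \mathbb{R}^\infty\colon B \in \mathcal{B}^k,\ k \in \mathbb{N}\}).$$

We can rewrite (5.7) in terms of the process variables as

$$\beta(n) \le E\Big[\sup_{A\in\mathcal{B}^\infty} |P((N_n, N_{n+1}, \ldots) \in A \mid \lambda_1) - P((N_n, N_{n+1}, \ldots) \in A)|\Big]. \tag{5.8}$$
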

We will derive an upper estimate for the right-hand side of (5.8) via a coupling approach
similar to that in the proof of Theorem 2.1. To this end, we will construct on an
appropriate probability space $(\Omega', \mathcal{A}', P')$ two versions of the bivariate process, $((N_t', \lambda_t'))_{t\in\mathbb{N}}$
and $((N_t'', \lambda_t''))_{t\in\mathbb{N}}$, where the starting values $\lambda_1'$ and $\lambda_1''$ are independent and distributed
according to the stationary law $\pi$. Since, for any $A \in \mathcal{B}^\infty$,

$$P((N_n'', N_{n+1}'', \ldots) \in A \mid \lambda_1') = P((N_n, N_{n+1}, \ldots) \in A),$$

it follows that

$$
\begin{aligned}
&|P((N_n, N_{n+1}, \ldots) \in A \mid \lambda_1 = u) - P((N_n, N_{n+1}, \ldots) \in A)| \\
&\quad = |P((N_n', N_{n+1}', \ldots) \in A \mid \lambda_1' = u) - P((N_n'', N_{n+1}'', \ldots) \in A \mid \lambda_1' = u)| \\
&\quad \le P((N_n', N_{n+1}', \ldots) \ne (N_n'', N_{n+1}'', \ldots) \mid \lambda_1' = u).
\end{aligned}
$$

Therefore, we obtain that

$$\beta(n) \le P((N_n', N_{n+1}', \ldots) \ne (N_n'', N_{n+1}'', \ldots)). \tag{5.9}$$

Hence, to estimate $\beta(n)$, we will construct a coupling such that the processes $(N_t')_{t\in\mathbb{N}}$
and $(N_t'')_{t\in\mathbb{N}}$ coalesce after $n$ steps with a high probability.

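Using exactly the same construction as in the proof of Theorem 2.1, we can successively
construct pairs $(N_1', N_1''), (N_2', N_2''), \ldots, (N_{n-1}', N_{n-1}'')$ such that

$$E(|\lambda_n' - \lambda_n''| \mid \lambda_1', \lambda_1'') \le \kappa^{n-1}|\lambda_1' - \lambda_1''|.$$

From here on we deviate from the approach in the proof of Theorem 2.1, where we
constructed all pairs $(N_t', N_t'')$ such that their mean distance was small. By (ii) of Lemma 5.1,
we can construct $N_n'$ and $N_n''$ such that

$$P(N_n' \ne N_n'' \mid \lambda_1', \lambda_1'') \le \kappa^{n-1}|\lambda_1' - \lambda_1''|.$$

If the event $\{N_n' = N_n''\}$ occurs, then (2.3) reduces to

$$|\lambda_{n+1}' - \lambda_{n+1}''| \le \kappa_1|\lambda_n' - \lambda_n''|,$$

which allows us to construct the next pair $(N_{n+1}', N_{n+1}'')$ such that

$$P(N_n' = N_n'',\ N_{n+1}' \ne N_{n+1}'' \mid \lambda_1', \lambda_1'') \le \kappa_1\kappa^{n-1}|\lambda_1' - \lambda_1''|.$$

Continuing in the same way, we arrive at

$$P(N_n' = N_n'', \ldots, N_{n+k-1}' = N_{n+k-1}'',\ N_{n+k}' \ne N_{n+k}'' \mid \lambda_1', \lambda_1'') \le \kappa_1^k\kappa^{n-1}|\lambda_1' - \lambda_1''|.$$

Hence, we finally obtain that

$$
\begin{aligned}
&P((N_n', N_{n+1}', \ldots) \ne (N_n'', N_{n+1}'', \ldots)) \\
&\quad = P(N_n' \ne N_n'') + \sum_{k=1}^\infty P(N_n' = N_n'', \ldots, N_{n+k-1}' = N_{n+k-1}'',\ N_{n+k}' \ne N_{n+k}'') \\
&\quad \le C_0\,\kappa^{n-1}/(1 - \kappa_1),
\end{aligned}
\tag{5.10}
$$

where $C_0 := E|\lambda_1' - \lambda_1''| \le 2E\lambda_1 < \infty$. This yields, in conjunction with (5.9), Assertion (i).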

To show (ii), define the functions $f_1 = f$ and, for $d \ge 2$, $f_d(\lambda; n_1, \ldots, n_d) = f_{d-1}(f(\lambda, n_d);
n_1, \ldots, n_{d-1})$, where $n_1, \ldots, n_d \in \mathbb{N}_0$ and $\lambda \ge 0$. It is clear from (2.2) that

$$\lambda_t = f_d(\lambda_{t-d}; N_{t-1}, \ldots, N_{t-d}).$$

It follows from (2.3) that

$$E|\lambda_t - f_d(0; N_{t-1}, \ldots, N_{t-d})| \le \kappa_1^d\,E\lambda_{t-d}.$$

Hence, as $d \to \infty$, $f_d(0; N_{t-1}, \ldots, N_{t-d})$ converges in $L_1$ to $\lambda_t$. By taking an
appropriate subsequence, we also get almost sure convergence. This means that there exists a
measurable function $f_\infty\colon \mathbb{N}_0^\infty \to [0, \infty)$ such that

$$\lambda_t = f_\infty(N_{t-1}, N_{t-2}, \ldots) \quad \text{almost surely.} \tag{5.11}$$

By stationarity, (5.11) holds for all $t \in \mathbb{Z}$, which proves (ii).
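
The approximation of $\lambda_t$ by $f_d(0; N_{t-1}, \ldots, N_{t-d})$ can be illustrated on a fixed path. The Python sketch below uses a hypothetical linear $f$ with $\kappa_1 = 0.3$ and an arbitrary list of counts; it checks the pathwise version of the bound, $|\lambda_t - f_d(0; \ldots)| \le \kappa_1^d \lambda_{t-d}$, which follows from (2.3) because both recursions are driven by the same counts.

```python
def truncated_intensity(f, recent_counts):
    """f_d(0; N_{t-1}, ..., N_{t-d}) with recent_counts = [N_{t-1}, ..., N_{t-d}]."""
    lam = 0.0
    for n_past in reversed(recent_counts):  # apply f to the oldest count first
        lam = f(lam, n_past)
    return lam


def intensity_path(f, counts, lam1):
    """lam_1, ..., lam_{n+1} from (2.2), where counts = [N_1, ..., N_n]."""
    lams = [lam1]
    for n_t in counts:
        lams.append(f(lams[-1], n_t))
    return lams


def truncation_errors(f, counts, lam1):
    """List of (d, |lam_t - f_d(0; ...)|, lam_{t-d}) for t = len(counts) + 1."""
    lams = intensity_path(f, counts, lam1)
    n = len(counts)
    rows = []
    for d in range(1, n + 1):
        recent = counts[n - d:][::-1]  # N_{t-1}, ..., N_{t-d}
        err = abs(lams[n] - truncated_intensity(f, recent))
        rows.append((d, err, lams[n - d]))
    return rows


def linear_f(lam, y):
    """Hypothetical linear specification with kappa_1 = 0.3, kappa_2 = 0.4."""
    return 1.0 + 0.3 * lam + 0.4 * y
```

For example, `truncation_errors(linear_f, [2, 0, 1, 3, 1, 0, 4, 2], 2.0)` lists errors that decay geometrically in `d`, so a moderate number of lagged counts already determines the intensity to high accuracy.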

To show (iii), we first recall the well-known fact that absolute regularity implies strong
mixing. That is, it follows from (i) that

$$\alpha(n) = \sup_{A\in\mathcal{B}^N_{-\infty,0},\ B\in\mathcal{B}^N_{n,\infty}} |P(A \cap B) - P(A)P(B)| \longrightarrow 0 \quad (n \to \infty); \tag{5.12}$$

see Doukhan [14], page 20. Furthermore, strong mixing implies ergodicity; see Remark 2.6
on page 50 in combination with Proposition 2.8 on page 51 in Bradley [3]. Finally, we
conclude from the representation (5.11) by Proposition 2.10(ii) in Bradley [3], page 54,
that the bivariate process $((N_t,\lambda_t))_{t\in\mathbb{Z}}$ is also ergodic.
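To prove (iv), we study the asymptotics of a process $((\tilde N_t, \tilde\lambda_t))_{t \in \mathbb{N}}$ obeying (2.1), (2.2) and (2.3), which is started with $\tilde\lambda_1 = 0$. We obtain from (2.4) and $E(\tilde N_t^2 \mid \tilde\lambda_t) = \tilde\lambda_t^2 + \tilde\lambda_t$ that

\[
\begin{aligned}
E(\tilde\lambda_t^2 \mid \tilde\lambda_{t-1})
 &\le E\bigl((f(0,0) + \kappa_1 \tilde\lambda_{t-1} + \kappa_2 \tilde N_{t-1})^2 \mid \tilde\lambda_{t-1}\bigr) \\
 &= (f(0,0) + \kappa \tilde\lambda_{t-1})^2 + \kappa_2^2 \tilde\lambda_{t-1} \\
 &\le K_0 + \bar\kappa\, \tilde\lambda_{t-1}^2
\end{aligned}
\]

for any $\bar\kappa > \kappa$ and appropriate $K_0 = K_0(\bar\kappa)$. We choose $\bar\kappa \in (\kappa, 1)$. Then we obtain that

\[ E(\tilde\lambda_3^2 \mid \tilde\lambda_1) \le K_0 + \bar\kappa\, E(\tilde\lambda_2^2 \mid \tilde\lambda_1) \le K_0 + \bar\kappa (K_0 + \bar\kappa \tilde\lambda_1^2). \]

Continuing in the same way we arrive at the inequality

\[ E(\tilde\lambda_t^2 \mid \tilde\lambda_1) \le K_0 (1 + \bar\kappa + \cdots + \bar\kappa^{\,t-2}). \]

Since $\tilde\lambda_t \stackrel{d}{\longrightarrow} \lambda_1$, we conclude from Theorem 5.3 in Billingsley [2] that

\[ E\lambda_1^2 \le \liminf_{t \to \infty} E\tilde\lambda_t^2 \le K_0 / (1 - \bar\kappa), \]

which proves (iv). □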
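For readers who want the second-moment step spelled out, the following worked computation may help. It writes $a := f(0,0)$, $\kappa = \kappa_1 + \kappa_2$, and lets $N$ be Poisson with intensity $\lambda$ given $\lambda$; the auxiliary symbol $b$ and the explicit constant in the last line are one admissible choice made here for illustration, not necessarily the constant intended in the paper.

```latex
\begin{align*}
E\bigl((a + \kappa_1\lambda + \kappa_2 N)^2 \mid \lambda\bigr)
  &= (a + \kappa_1\lambda)^2 + 2\kappa_2 (a + \kappa_1\lambda)\, E(N \mid \lambda)
     + \kappa_2^2\, E(N^2 \mid \lambda) \\
  &= (a + \kappa_1\lambda)^2 + 2\kappa_2\lambda (a + \kappa_1\lambda)
     + \kappa_2^2 (\lambda^2 + \lambda) \\
  &= (a + \kappa\lambda)^2 + \kappa_2^2 \lambda, \\[4pt]
(a + \kappa\lambda)^2 + \kappa_2^2 \lambda
  &= \kappa^2\lambda^2 + b\lambda + a^2, \qquad b := 2a\kappa + \kappa_2^2, \\
  &\le \bar\kappa\,\lambda^2 + a^2 + \frac{b^2}{4(\bar\kappa - \kappa^2)},
\end{align*}
% The last step uses b*lambda <= c*lambda^2 + b^2/(4c) with c = \bar\kappa - \kappa^2 > 0,
% which is positive because \bar\kappa > \kappa > \kappa^2 when \kappa < 1.
```

Under these assumptions one may take $K_0 = a^2 + b^2 / (4(\bar\kappa - \kappa^2))$, which is finite for every $\bar\kappa \in (\kappa, 1)$.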

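Proof of Proposition 4.1. We will use the central limit theorem (CLT) for martingale difference arrays given on page 171 in Pollard [24]. We define the filtration $(\mathcal{B}_t)_{t \in \mathbb{N}_0}$ with $\mathcal{B}_t = \sigma(\lambda_1, N_1, \ldots, N_t)$, for $t = 0, 1, \ldots,$ and we set $Z_t = (N_t - \lambda_t)^2 - N_t$. Since $N_t \mid \mathcal{B}_{t-1} \sim \mathrm{Poisson}(\lambda_t)$, we obtain that

\[ E(Z_t \mid \mathcal{B}_{t-1}) = 0 \]

and

\[ E(Z_t^2 \mid \mathcal{B}_{t-1}) = 2\lambda_t^2. \]

Hence, it follows from the ergodicity stated in (iii) of Theorem 3.1 that

\[ \frac{1}{n} \sum_{t=1}^{n} E(Z_t^2 \mid \mathcal{B}_{t-1}) \stackrel{P}{\longrightarrow} 2E\lambda_1^2. \]

It remains to verify the conditional Lindeberg condition,

\[ \frac{1}{n} \sum_{t=1}^{n} E\bigl(Z_t^2\, I(|Z_t / \sqrt{n}| > \epsilon) \mid \mathcal{B}_{t-1}\bigr) \stackrel{P}{\longrightarrow} 0 \qquad \forall \epsilon > 0. \]

We have $E\bigl[n^{-1} \sum_{t=1}^{n} E(Z_t^2\, I(|Z_t / \sqrt{n}| > \epsilon) \mid \mathcal{B}_{t-1})\bigr] = E[Z_1^2\, I(|Z_1 / \sqrt{n}| > \epsilon)]$, which tends to zero as $n \to \infty$ since $E\lambda_1^2 < \infty$ implies that $EZ_1^2 < \infty$. Hence, the conditional Lindeberg condition is also satisfied and the assertion follows from the CLT mentioned above. □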
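The two conditional moments of $Z_t$ used in this proof can be checked numerically. The sketch below evaluates, for a fixed intensity, the first and second moments of Z = (N - lam)^2 - N under a Poisson(lam) law by summing over a truncated probability mass function. The function names and the truncation level `kmax` are illustrative assumptions.

```python
import math

def poisson_pmf(k, lam):
    """Poisson(lam) probability of k (lam > 0), computed in log space for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def z_moments(lam, kmax=300):
    """Return (E Z, E Z^2) for Z = (N - lam)^2 - N with N ~ Poisson(lam).

    The sums are truncated at kmax, which is ample for moderate lam.
    """
    m1 = 0.0
    m2 = 0.0
    for k in range(kmax + 1):
        p = poisson_pmf(k, lam)
        z = (k - lam) ** 2 - k
        m1 += p * z
        m2 += p * z * z
    return m1, m2

# Compare with the closed forms E Z = 0 and E Z^2 = 2 * lam^2.
results = {lam: z_moments(lam) for lam in (0.5, 2.0, 7.5)}
```

For each intensity in `results`, the first entry should be numerically zero and the second close to twice the squared intensity, matching the martingale difference property and the conditional variance stated above.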
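Proof of Lemma 4.1. Assume for the time being that $\|\hat\theta_n - \theta_0\| \le \delta$, which allows us to conveniently exploit the smoothness assumptions on $f_\theta$. Then we obtain that

\[
\begin{aligned}
|\hat\lambda_2 - \lambda_2|
 &\le |f_{\hat\theta_n}(\hat\lambda_1, N_1) - f_{\hat\theta_n}(\lambda_1, N_1)| + |f_{\hat\theta_n}(\lambda_1, N_1) - f_{\theta_0}(\lambda_1, N_1)| \\
 &\le \kappa_1 |\hat\lambda_1 - \lambda_1| + C \|\hat\theta_n - \theta_0\| (\lambda_1 + N_1 + 1)
\end{aligned}
\]

and

\[
\begin{aligned}
|\hat\lambda_3 - \lambda_3|
 &\le \kappa_1 |\hat\lambda_2 - \lambda_2| + C \|\hat\theta_n - \theta_0\| (\lambda_2 + N_2 + 1) \\
 &\le C \|\hat\theta_n - \theta_0\| \{(\lambda_2 + N_2 + 1) + \kappa_1 (\lambda_1 + N_1 + 1)\} + \kappa_1^2 |\hat\lambda_1 - \lambda_1|.
\end{aligned}
\]

Continuing in the same way, we arrive at

\[
\begin{aligned}
|\hat\lambda_t - \lambda_t|
 &\le C \|\hat\theta_n - \theta_0\| \{(\lambda_{t-1} + N_{t-1} + 1) + \kappa_1 (\lambda_{t-2} + N_{t-2} + 1) + \cdots + \kappa_1^{t-2} (\lambda_1 + N_1 + 1)\} \\
 &\quad + \kappa_1^{t-1} |\hat\lambda_1 - \lambda_1|,
\end{aligned}
\tag{5.13}
\]

which yields that

\[
\begin{aligned}
(\hat\lambda_t - \lambda_t)^2
 &\le 2C^2 \|\hat\theta_n - \theta_0\|^2 \{(\lambda_{t-1} + N_{t-1} + 1) + \kappa_1 (\lambda_{t-2} + N_{t-2} + 1) + \cdots + \kappa_1^{t-2} (\lambda_1 + N_1 + 1)\}^2 \\
 &\quad + 2\kappa_1^{2t-2} (\hat\lambda_1 - \lambda_1)^2
\end{aligned}
\]

holds for all $t \ge 2$. Hence, we obtain under $\|\hat\theta_n - \theta_0\| \le \delta$ that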

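\[ \sum_{t=1}^{n} (\hat\lambda_t - \lambda_t)^2 \le \frac{2}{1 - \kappa_1^2} (\hat\lambda_1 - \lambda_1)^2 + \frac{2C^2}{(1 - \kappa_1)^2} \|\hat\theta_n - \theta_0\|^2 \sum_{t=1}^{n} (\lambda_t + N_t + 1)^2. \]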

The right-hand side is bounded in probability, which proves the assertion. □ 

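Proof of Theorem 4.1. We show that the difference between the test statistic $\widehat T_n$ and $T_{n,0}$ tends to zero in probability. This will yield the assertion by Proposition 4.1. We have that

\[ \widehat T_n - T_{n,0} = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} (\hat\lambda_t - \lambda_t)^2 + \frac{2}{\sqrt{n}} \sum_{t=1}^{n} (N_t - \lambda_t)(\lambda_t - \hat\lambda_t). \tag{5.14} \]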



According to Lemma 4.1, the first term on the right-hand side converges to zero in probability. The estimation of the second one, however, is more delicate since $\hat\lambda_t$ depends via $\hat\theta_n$ on the whole sample, which means that this term is not a sum of martingale differences. To proceed, we take first any non-random $\theta'$ with $\|\theta' - \theta_0\| \le \delta$ and consider the intensity process given by $\lambda'_1 = \hat\lambda_1$ and, for $t = 2, \ldots, n$, $\lambda'_t = f_{\theta'}(\lambda'_{t-1}, N_{t-1})$. We obtain in complete analogy to (5.13) that

\[
\begin{aligned}
|\lambda'_t - \lambda_t|
 &\le C \|\theta' - \theta_0\| \{(\lambda_{t-1} + N_{t-1} + 1) + \kappa_1 (\lambda_{t-2} + N_{t-2} + 1) + \cdots + \kappa_1^{t-2} (\lambda_1 + N_1 + 1)\} \\
 &\quad + \kappa_1^{t-1} |\hat\lambda_1 - \lambda_1|.
\end{aligned}
\]

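Therefore, we obtain that

\[ E\left| \frac{1}{\sqrt{n}} \sum_{t=1}^{n} (N_t - \lambda_t)(\lambda_t - \lambda'_t)\, I(|\hat\lambda_1 - \lambda_1| \le M) \right| = O(\|\theta' - \theta_0\| + n^{-1/2}). \tag{5.15} \]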



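Since $\|\hat\theta_n - \theta_0\| = O_P(n^{-1/2})$ it suffices to establish (5.15) on a sequence of grids $\mathcal{G}_n$ on the set $\{\theta \in \Theta\colon \|\theta - \theta_0\| \le \epsilon_n^{-1} n^{-1/2}\}$, where $\mathrm{mesh}(\mathcal{G}_n) \le \epsilon_n n^{-1/2}$, $\#\mathcal{G}_n \le \epsilon_n^{2} n^{1/2}$, for some null sequence $(\epsilon_n)_{n \in \mathbb{N}}$. It follows from (5.15) that

\[ \sup_{\theta' \in \mathcal{G}_n} \left| \frac{1}{\sqrt{n}} \sum_{t=1}^{n} (N_t - \lambda_t)(\lambda_t - \lambda'_t) \right| = O_P(\epsilon_n). \tag{5.16} \]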



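Moreover, for any value of $\hat\theta_n$ with $\|\hat\theta_n - \theta_0\| \le \epsilon_n^{-1} n^{-1/2}$ we will find some $\theta' \in \mathcal{G}_n$ with $\|\hat\theta_n - \theta'\| \le \epsilon_n n^{-1/2}$. Since

\[ \left| \frac{1}{\sqrt{n}} \sum_{t=1}^{n} (N_t - \lambda_t)(\lambda'_t - \hat\lambda_t) \right| \le \sqrt{\frac{1}{n} \sum_{t=1}^{n} (N_t - \lambda_t)^2}\ \sqrt{\sum_{t=1}^{n} (\lambda'_t - \hat\lambda_t)^2} = o_P(1), \]

we obtain, in conjunction with (5.16), that the second term on the right-hand side of (5.14) is $o_P(1)$. This completes the proof. □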